Audio processing



The present invention relates to processing audio signals. 

Referring now to Figure 1, in a conventional audio system, a decoder 10 
receives an audio stream AS in which an audio signal (not shown) has been encoded. The 
decoder 10 produces time-domain signals 14 corresponding to successive fragments of the
audio signal. For a stereo-encoded audio signal, the decoder produces a pair of, for example, 
mid/side or difference stereo-channel signals 14. It is known to apply post-processing to these 
channel signals to enhance aspects of the signal. So, for example, a post-processor 12 may 
perform stereo widening on the channel signals 14 to produce altered channel signals 16. The 
channel signals 16 are then fed to an audio output system 15 through which the signals are 
played for a listener, or alternatively stored or transmitted. 

In many encoders, including for example MPEG encoders, an audio signal is
encoded in a bit stream using a lossy process. It has been found that cascading audio
decoders (codecs) for such bit streams and post-processing components can be problematic.
This is because post-processing a lossy encoded audio fragment can result in unwanted 
audible artefacts due to quantization noise generated in encoding the original audio fragment. 

To prevent degraded audio quality of encoded fragments after post-processing, 
the encoder, the decoder or the post-processor could be modified. However, this would
involve significant re-engineering of existing systems. 

Because a solution to the above problem needs to be implemented in systems 
that apply post-processing to already encoded fragments, it should be noted that the original 
audio fragment from which the bitstream was produced would generally not be available. 

At the same time, before any post-processing changes to a signal are made, the 
quality of the audio signal after post-processing should be known. Although some techniques 
can be found in the literature for objective audio quality measurement, they generally assume 
that the original audio fragment is available. 

Conventional methods, such as cross-correlation, do not indicate whether
quantization noise will be audible or not. Simple experiments have shown that the cross-
correlation between left and right channels for post-processed mid/side-encoded and
difference-encoded stereo fragments is similar, whereas the audio quality of the post-
processed fragments of both modes can be completely different.

According to the present invention there is provided an audio system
according to claim 1. 

The present invention provides a system and method for detecting audible
quantization noise after post-processing without having an original audio fragment available
and preventing quantization noise becoming audible by adjusting the degree of post-
processing.

The invention provides a "blind" objective measurement of a signal, i.e. quality
measurement is performed with only the decoded audio fragment available. The invention
makes changes in the signal path in a manner that means existing components do not need to
be modified to implement the invention.



Embodiments of the present invention will now be described by way of
example with reference to the accompanying drawings, in which:

Figure 1 shows a prior art audio system;

Figure 2 shows an audio system according to a first embodiment of the present
invention;

Figures 3(a) and (b) illustrate the degree of quantization noise audible for an
original signal and a post-processed signal respectively; and

Figures 4 and 5 illustrate further audio systems according to alternative
embodiments of the present invention.

Figure 2 shows an audio system for post-processing encoded audio fragments
according to a first embodiment of the present invention. First, an encoded audio bit-stream
AS is decoded in a decoder 10 and afterwards post-processed by a post-processor 12. The
preferred embodiment is described with reference to an MPEG-1 Layer I decoder in
combination with an Incredible Sound post-processor (described in, for example, PCT
Application No. WO98/21915 and US Patent No. 5,742,687), although it will be seen that the
invention is applicable to encoders and post-processors in general. Thus, the decoder 10
produces the channels 14 in PCM (Pulse Code Modulated) form and the post-processor 12
performs stereo-widening on the channels 14 to produce output channels 16.

A detector 17 calculates an amount of distortion D for each frame or fragment
of the audio stream and feeds this measurement to a regulator 18, which determines the
maximum amount of post-processing permitted. In the case of Incredible Sound, the degree
of stereo-widening performed by the post-processor 12 is determined by a parameter a
provided by the regulator 18. Thus, the amount of post-processing can be decreased, if
necessary, by the regulator 18 lowering the value of a supplied to the post-processing unit
12.

In the first embodiment the audibility of quantization noise or the degree of 
distortion after post-processing is detected assuming that only the bit-stream for the coded 
fragment is available. The detection method is based on a psycho-acoustic model and the bit- 
allocation procedure used in an encoder during the bit-allocation process. 

A psycho-acoustic model is based on the knowledge that, due to the specific
behavior of the inner ear, the human auditory system perceives only a small part of the
complex audio spectrum. Only those parts of the spectrum located above a masking threshold
of a given sound contribute to its perception. Thus, any acoustic action occurring at the same
time as a given sound but with less intensity and thus situated under the masking threshold
will not be heard because it is masked by the main sound event. The aim of an encoder is to
lower the bit-rate of the audio stream as much as possible while keeping the quantization
noise below the masking threshold.

In an MPEG encoder, the perceptible part of the audio signal is extracted by
splitting the frequency spectrum into 32 equally-spaced sub-bands. In each sub-band, the
signal is quantized in such a way that the quantizing noise matches or is just below the
masking threshold.
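As a small worked illustration of the sub-band split, the following sketch computes the frequency range covered by each of the 32 equally spaced sub-bands. The function name `subband_edges_hz` is hypothetical, and the 48 kHz sampling frequency is taken from the example used later in this description; the sketch assumes the sub-bands span 0 Hz up to half the sampling frequency.

```python
def subband_edges_hz(sample_rate_hz, num_subbands=32):
    """Return (low, high) edges in Hz of equally spaced sub-bands up to Nyquist."""
    width = (sample_rate_hz / 2) / num_subbands
    return [(i * width, (i + 1) * width) for i in range(num_subbands)]


edges = subband_edges_hz(48_000)
# Sub-bands 1 to 5 (counting from 0) together cover 750 Hz to 4500 Hz.
print(edges[1][0], edges[5][1])  # 750.0 4500.0
```

At 48 kHz each sub-band is therefore 750 Hz wide.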

However, after post-processing, the noise levels may exceed the masked 
threshold resulting in audible quantization noise. Thus, the detection method of the preferred 
embodiment determines to what extent the noise levels exceed the masked threshold. 

In the first embodiment, the following assumptions are made: 

• the original audio signal fragment is not available,

• the bit-stream of the coded fragment (AS) for the audio signal is available,

• the type of post-processing technique used is known, and

• the coded fragment is perceptually equal, i.e. it should sound the same, as the original
fragment.
Because the original fragment is not available, the actual error-signal (noise)
resulting from quantization (the coded fragment minus the original fragment) is also not
available. However, from a bitstream, information can be extracted to determine, for
example, what type of codec, bit-rate(s) and settings have been used in the encoder to
generate the bitstream.

Although it is assumed that the original fragment is not available in the 
preferred embodiment, the original fragment is useful in demonstrating the quality of the 
estimations employed in the preferred embodiments. So, referring to Figure 3(a), the 
frequency spectrum of an original audio fragment is indicated at 22. The line 24 indicates the 
masked threshold for the signal calculated in a conventional manner from the spectrum 22. 

MPEG-1 Layer I uses uniform symmetric mid-tread quantizers. If the input
range of the quantizer is [-1,+1], then the step size $\Delta$ is the difference between two
successive quantization levels and is given by:

$$\Delta = \frac{2}{M-1}$$

where M is the number of quantization levels used.

Generally, if the input signal is within the quantizer-input range and if M is
large enough, it can be shown for a very large class of signals that the quantization error
$\varepsilon$ is approximately uniformly distributed having a variance of:

$$\sigma_\varepsilon^2 = \frac{\Delta^2}{12}$$
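The variance figure can be checked numerically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the input is drawn uniformly from [-1,+1] (one member of the "very large class of signals"), and M is taken to be odd so that zero is a quantization level, as in a mid-tread quantizer.

```python
import random


def step_size(num_levels):
    """Step size of a uniform quantizer with num_levels levels spanning [-1, +1]."""
    return 2.0 / (num_levels - 1)


def quantize_midtread(x, num_levels):
    """Round x to the nearest level of a symmetric mid-tread quantizer on [-1, +1]."""
    delta = step_size(num_levels)
    return round(x / delta) * delta


def empirical_error_variance(num_levels, num_samples=100_000, seed=0):
    """Mean squared quantization error for uniformly distributed test input."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        err = quantize_midtread(x, num_levels) - x
        total += err * err
    # The error is zero-mean, so the mean square approximates the variance.
    return total / num_samples


delta = step_size(15)
print(empirical_error_variance(15), delta ** 2 / 12)
```

The two printed values should agree to within sampling error.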



For each frame of an audio fragment and for every sub-band, a group of 12 sub-band samples
is first normalized to [-1,+1], resulting in 32 scale factors $\mathrm{scf}_i$, one for each
sub-band $i$. The energy of the noise levels for each sub-band $i$ can now be estimated as:

$$\sigma_{\varepsilon,i}^2 = \frac{\mathrm{scf}_i^2\,\Delta_i^2}{12} \qquad \text{(Equation 1)}$$



This can be calculated for left and right channels and for all sub-bands. Thus,
the noise levels for the fragment 22, if encoded in, say, an MPEG-1 Layer I encoder, are
indicated by the line 26. It can be seen that for the frequency ranges 28, 28' and 28" these
noise levels exceed the masking threshold 24 and so it is assumed that some distortion may
be audible even in the originally encoded audio fragment.
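The per-sub-band noise estimate can be sketched in a few lines. This is a minimal illustration, assuming Equation 1 has the form scale factor squared times step size squared over 12, and that the scale factor and number of quantization levels for each sub-band have already been read from the bitstream. The function name and the handling of sub-bands with no allocated bits are assumptions of the sketch, not part of the description.

```python
def noise_energy_per_subband(scale_factors, num_levels):
    """Estimate quantization-noise energy per sub-band as (scf * delta)^2 / 12."""
    energies = []
    for scf, m in zip(scale_factors, num_levels):
        if m < 2:
            # Assumption: no bits allocated, so this sketch reports no noise.
            energies.append(0.0)
            continue
        delta = 2.0 / (m - 1)  # step size for a quantizer input range of [-1, +1]
        energies.append((scf * delta) ** 2 / 12.0)
    return energies


# One call per channel, e.g. for the left channel of a single frame:
left_noise = noise_energy_per_subband([1.0, 0.5, 0.25], [3, 15, 0])
```

Calling the function once per channel and per frame gives the noise levels for left and right channels and for all sub-bands.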
However, when post-processing such lossy-encoded audio fragments, the
post-processed quantization noise may further exceed the masking threshold of the post-
processed fragment. As can be seen from the range 30 in Figure 3(b), the noise level
indicated by the line 26' exceeds the masking threshold 24' of the post-processed signal
indicated by the line 22' across a large frequency range and by a significant amount. Thus,
Figure 3(b) shows a significant rise in audible noise levels - compared to that of the coded
fragment of Figure 3(a) - between approximately [5,15] Bark which is approximately equal to
[500,5000] Hz.

As mentioned previously, the original fragment is assumed not to be available
in the detection process. Therefore, the actual masked thresholds and quantization noise
levels of the coded and post-processed fragments are not available. However, these two
quantities can be estimated from the bit-stream of the coded fragment (AS).

Turning now to the estimation of the masking threshold 24' and the noise level
26': in one variation of the first embodiment, a psycho-acoustic modeling component 20
generates an estimate for the masking threshold Mt for each frame from a post-processed
channel 16. In the case of Incredible Sound post-processing, most of the processing affects
the difference channel and so the amount of energy in the difference channel determines the
amount of audible quantization noise after post-processing stereo-encoded fragments. Thus,
the PCM data for each fragment of the difference channel is Fourier transformed by the
psycho-acoustic modeling component 20 to provide a frequency spectrum for the post-
processed fragment of the type shown by the line 22' in Figure 3(b). The estimate of the
masking threshold Mt indicated by the line 24' is then calculated from the spectrum 22' in a
conventional manner and provided to the detector 17.

An estimate of the noise level $\sigma^2$ for the post-processed fragment is derived in
the detector 17 by first estimating the noise levels for the original fragment from the encoded
bitstream (AS) using the quantization level information provided in the bitstream and
Equation 1. Then, knowing the type of post-processing to be performed on the decoded
signal, the detector 17 can perform the same post-processing on the estimated noise levels for
the original fragment to provide the estimate of the noise level $\sigma^2$ for the
post-processed fragment.

The detector 17 then provides a measure of the amount of distortion D in the
post-processed signal by integrating the estimated amount of noise level 26' in the post-
processed signal exceeding the masking threshold 24' for those frequencies for which
quantization noise is audible on a frame-by-frame basis, i.e. the distortion measurement D is
equal to:



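$$D = \sum_{i=1}^{5} d_i, \qquad d_i = \begin{cases} \left(\sigma_i^2 - M_{t,i}\right)^n, & \sigma_i^2 > M_{t,i} \\ 0, & \text{otherwise} \end{cases}$$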



where i is the sub-band number and n is a penalty index. The higher n, the more the distortion
is penalized. For a sampling frequency of 48 kHz, the range i=[1,5] is equal to [750,4500] Hz,
which is approximately the range where quantization noise is audible after post-processing.
On the basis of the distortion measurement D, the regulator 18 can then decide to take
action against audible quantization noise.
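A per-frame distortion measure of this kind might be computed as sketched below. The exact form used here is an assumption: for each sub-band i in [1,5], the amount by which the estimated noise level exceeds the masking threshold is raised to the power n and summed, with sub-bands at or below the threshold contributing 0. The function name, the default n = 2 and the use of linear energy units are likewise hypothetical.

```python
def distortion(noise, threshold, n=2.0, bands=range(1, 6)):
    """Sum (noise - threshold)^n over the chosen sub-bands where noise exceeds threshold."""
    total = 0.0
    for i in bands:
        excess = noise[i] - threshold[i]
        if excess > 0.0:  # sub-bands at or below the threshold contribute 0
            total += excess ** n
    return total


# Per-sub-band estimates for one frame (index = sub-band number, made-up values).
noise_levels = [0.0, 3.0, 1.0, 5.0, 0.0, 0.0, 9.0]
mask_levels = [0.0, 1.0, 2.0, 2.0, 0.0, 0.0, 0.0]
d = distortion(noise_levels, mask_levels)
```

With these made-up values only sub-bands 1 and 3 exceed the threshold, and sub-band 6 lies outside the summed range, so larger values of n weight the excess in sub-band 3 more heavily.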

An improved distortion measurement would, for example, also examine the 
durations of noise exceeding the masked threshold. The longer these durations, the more 
likely that quantization noise will become audible. This however is more complex than the 
simple distortion measurement D above. 
It will be seen that using this first variation of the first embodiment, the
regulator 18 will tend to allow audible distortion to occur before taking corrective action. In
such cases, the system would need to have a desired level of post-processing so that if the
level of post-processing is dropped for a particular frame or fragment, it can be incrementally
increased thereafter towards the target value until a lessening correction is required again.
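The drop-and-recover behaviour described above could be sketched as follows. This is one possible reading rather than the regulator of the embodiment: the class name, the distortion limit `d_max`, the multiplicative `cut` and the additive recovery `step` are all hypothetical parameters chosen for illustration.

```python
class Regulator:
    """Lower the post-processing parameter on high distortion, then creep back to target."""

    def __init__(self, target, d_max, cut=0.5, step=0.05):
        self.target = target  # desired level of post-processing
        self.d_max = d_max    # distortion above this triggers a reduction
        self.cut = cut        # multiplicative reduction applied on a trigger
        self.step = step      # per-frame increment while recovering
        self.a = target       # current value supplied to the post-processor

    def update(self, d):
        """Return the parameter value to use after seeing distortion d for a frame."""
        if d > self.d_max:
            self.a *= self.cut
        else:
            self.a = min(self.target, self.a + self.step)
        return self.a


reg = Regulator(target=1.0, d_max=10.0)
history = [reg.update(d) for d in (0.0, 20.0, 0.0, 0.0)]
```

In this run the parameter is halved on the second frame and then rises by one step per frame towards the target.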
In a second variation of the preferred embodiment, Figure 4, a variant of the
psycho-acoustic modeling component 20' draws the signal energy level data from the
bitstream AS. As in the first variation in relation to noise, knowing the type of post-
processing to be performed on the decoded signal, the component 20' can perform the same
processing on the original fragment to provide a frequency spectrum estimate of the post-
processed signal as indicated by the line 22' in Figure 3(b). The masking threshold 24' can
then be calculated for this estimated signal and this can be passed to the detector 17 as before
to enable the detector 17 to generate an estimate of the distortion D to be produced with the
current level of post-processing. The detector 17 may then pass this distortion measurement
D to the regulator 18 which can reduce the level of post-processing to be performed on the
fragment for which the distortion estimate has been made. For example, for Incredible Sound
post-processing the factor a is lowered for high values of D.
In a second embodiment of the invention, Figure 5, only the decoded audio channels 14 are
available and so no decoder 10 is employed. In S. Moehrs, Jurgen Herre and Ralf Geiger,
"Analyzing decompressed audio with the "Inverse Decoder" - towards an operative
algorithm", Convention Paper 5576 of the 112th Convention of the AES, 2002 May 10-13,
Munich, and J. Herre and M. Schug, "Analysis of decompressed audio - The inverse
decoder", Convention Paper 5256 of the 109th AES Convention, Los Angeles, 2000, an
inverse decoder 10' is described. This enables the quantization levels for a fragment to be
detected from the PCM domain signal. Thus, in the second embodiment, the inverse decoder
10' provides this information to a variation of the detector 17'. The detector 17' first
estimates the noise levels for the original fragment and then processes these as before to
provide an estimate of the noise levels in the post-processed fragment. In Figure 5, the
psycho-acoustic modeling component 20 draws its data from the post-processed channels 16
as in Figure 2 to generate the masking threshold for the fragment which it provides to the
detector 17'. Using this masking threshold and the noise levels, the detector can generate the
distortion measure D as before.

It will be seen from the description above that in the preferred embodiments,
unwanted artefacts are prevented from becoming audible in the output channels 16 while the
audio bitstream AS is being decoded and post-processed in real-time.

In the preferred embodiments, the amount of post-processing applied is
lessened or even completely disabled by the regulator 18. This is generally applicable to all
post-processing techniques that add a certain amount of the processed signal to a certain
amount of the original signal.

Another example of the regulation of post-processing independent of the use
of noise levels or a masking threshold is to determine a as a function $f((L_i - R_i)/d)$, where
$f()$ is some monotonic function varying between 0 and 1 for the argument of $f()$ varying from
0 to a maximum and $d = \Delta \cdot \mathrm{scf}_i$. This means that if the difference between a left and right
channel sub-band signal is small, it is preferable not to boost the signal too much.
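One possible choice of such a monotonic function is a clamped linear ramp, sketched below. The function name, the use of the magnitude of the left/right difference, and the saturation point `x_max` are assumptions made for illustration; the description only requires that the function rise monotonically from 0 to 1.

```python
def widening_factor(left, right, d, x_max=4.0):
    """Map the left/right sub-band difference, in units of d, onto the range [0, 1]."""
    x = abs(left - right) / d        # difference relative to the quantization step d (d > 0)
    return min(x, x_max) / x_max     # linear ramp, saturating at 1 once x reaches x_max


small = widening_factor(0.50, 0.50, d=0.1)  # identical channels: no boost
large = widening_factor(1.00, 0.00, d=0.1)  # difference far above d: full boost
```

A smoother curve could be substituted for the ramp without changing the idea, provided it remains monotonic.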

In the preferred embodiments, the channels 14 and 16 are described as stereo
channels. However, it will be seen that the invention is also applicable to more than two
channels and also that the invention is not restricted to the number of channels 14 and 16
being the same.

In the preferred embodiments, the regulator 18 controls the post-processor 12
with a single parameter a. It will be seen that the invention is extendible to controlling many
parameters of the post-processor. For example, in the case of the preferred embodiments, a
vector of $a_i$ could be used to control the post-processing of each sub-band $i$.

In the preferred embodiments, it is assumed that the detector 17, 17' can 
estimate the post-processing carried out by the processor 12, as indicated by the line joining
the components. The invention is therefore not restricted to estimating the effect of post- 
processing by a strictly defined process such as Incredible Sound. For example, the complete 
path from the decoder output channels 14 to a human ear including for example, amplifiers, 
loudspeakers and headphones can be modeled as a post-processor signal path. In the case of 
the preferred embodiments, this model can be applied to the calculated noise levels and/or
masking thresholds to determine the degree to which the complete post-processing signal 
path makes quantization noise audible. Where the noise becomes excessively audible, the 
regulator can control some aspect of the post-processing signal path to reduce this noise, for 
example, by lowering the output volume of a loudspeaker slightly or adjusting the 
equalization of an amplifier. 

It should be noted that the above-mentioned embodiments illustrate rather than 
limit the invention, and that those skilled in the art will be able to design many alternative 
embodiments without departing from the scope of the appended claims. In the claims, any 
reference signs placed between parentheses shall not be construed as limiting the claim. The 
word 'comprising' does not exclude the presence of other elements or steps than those listed 
in a claim. The invention can be implemented by means of hardware comprising several 
distinct elements, and by means of a suitably programmed computer. In a device claim 
enumerating several means, several of these means can be embodied by one and the same 
item of hardware. The mere fact that certain measures are recited in mutually different
dependent claims does not indicate that a combination of these measures cannot be used to
advantage. 
1. An audio system comprising:

a post-processor arranged to alter successive fragments of a decoded audio signal to provide
successive fragments of post-processed audio signal;

a distortion detector for determining a degree to which quantization noise introduced in
encoding said successive fragments of audio signal becomes audible due to said post-
processing; and

a regulator arranged to control said post-processor according to said degree.

2. An audio system as claimed in claim 1 further comprising:

a masking threshold generator arranged to provide an estimate of a masking threshold for
said successive fragments of post-processed audio signal;

a noise level detector arranged to provide an estimate of a noise level for said successive
fragments of said post-processed audio signal;

and wherein said distortion detector determines said degree according to the degree to which
said noise level exceeds said masking threshold for successive fragments of said post-
processed audio signal.

3. An audio system as claimed in claim 2 further comprising a decoder arranged
to read an audio stream and to produce said successive fragments of audio signal.

4. An audio system as claimed in claim 3 wherein said decoder produces stereo-
encoded successive pairs of fragments of audio signal and said post-processor applies stereo-
widening to said successive pairs of fragments of audio signal.

5. An audio system as claimed in claim 2 wherein said masking threshold
generator comprises a psycho-acoustic modeling component arranged to transform said
successive fragments of post-processed audio signal into the frequency domain; and to derive
said masking threshold therefrom.

6. An audio system as claimed in claim 2 wherein said masking threshold
generator comprises a psycho-acoustic modeling component arranged to read said audio
stream and to produce successive fragments of audio signal; to apply similar post-processing
to said successive fragments of audio signal as said post-processor; to transform said
successive post-processed fragments of audio signal into the frequency domain; and to derive
said masking threshold from said post-processed signal.

7. An audio system as claimed in claim 2 further comprising an inverse decoder
arranged to read said successive fragments of a decoded audio signal and to provide 
therefrom indications of quantization levels employed in the encoding of an audio stream 
from which said audio signal is decoded. 

8. An audio system as claimed in claim 3 in which said noise level detector is 
arranged to derive from said audio stream quantization levels employed in the encoding of an 
audio stream. 



9. An audio system as claimed in claim 7 or 8 in which said noise level detector
is arranged to derive from said quantization levels a distribution of noise level in the
frequency domain for said successive fragments of a decoded audio signal, and to apply 
similar post-processing to said successive distributions of noise level as said post-processor 
to provide successive estimates of noise level for said successive fragments of said post- 
processed audio signal. 

10. A method of processing an audio stream comprising the steps of: 

post-processing successive fragments of a decoded audio signal to provide successive
fragments of post-processed audio signal;

detecting a degree to which quantization noise introduced in encoding said successive
fragments of audio signal becomes audible due to said post-processing; and
regulating said post-processing step according to said degree. 
ABSTRACT: « g. 07. 2002

An audio system comprises a post-processor (12) arranged to alter successive
fragments of a decoded audio signal (14) to provide successive fragments of post-processed
audio signal (16). A masking threshold generator (20) provides an estimate of a masking
threshold (M) for successive fragments of post-processed audio signal (16). A noise level
generator (17) provides an estimate of a noise level ($\sigma^2$) for successive fragments of the
post-processed audio signal (16). A distortion generator (17) determines a degree (D) to
which the noise level exceeds the masking threshold for successive fragments of the post-
processed audio signal (16). A regulator (18) controls the post-processor according to the
degree to which the noise levels exceed the masking threshold.



Fig. 2 